false information
HauntAttack: When Attack Follows Reasoning as a Shadow
Ma, Jingyuan, Li, Rui, Li, Zheng, Liu, Junfeng, Xia, Heming, Sha, Lei, Sui, Zhifang
Emerging Large Reasoning Models (LRMs) consistently excel in mathematical and reasoning tasks, showcasing remarkable capabilities. However, the enhancement of reasoning abilities and the exposure of internal reasoning processes introduce new safety vulnerabilities. A critical question arises: when reasoning becomes intertwined with harmfulness, will LRMs become more vulnerable to jailbreaks in reasoning mode? To investigate this, we introduce HauntAttack, a novel and general-purpose black-box adversarial attack framework that systematically embeds harmful instructions into reasoning questions. Specifically, we modify key reasoning conditions in existing questions with harmful instructions, thereby constructing a reasoning pathway that guides the model step by step toward unsafe outputs. We evaluate HauntAttack on 11 LRMs and observe an average attack success rate of 70%, achieving up to 12 percentage points of absolute improvement over the strongest prior baseline. Our further analysis reveals that even advanced safety-aligned models remain highly susceptible to reasoning-based attacks, offering insights into the urgent challenge of balancing reasoning capability and safety in future model development.
- Europe > Austria > Vienna (0.14)
- Europe > United Kingdom > England (0.04)
- Europe > France (0.04)
- (4 more...)
- Research Report > New Finding (1.00)
- Overview (0.93)
- Law Enforcement & Public Safety (1.00)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- Government > Military (0.88)
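As a rough illustration of the evaluation reported in the HauntAttack abstract above, the Python sketch below averages per-model attack success rates (ASR) from binary judge verdicts. The model names and verdict data are hypothetical placeholders, not figures from the paper.

```python
# Hypothetical sketch: averaging per-model attack success rates (ASR).
# verdicts[model] lists one judge label per attack prompt, True meaning
# the attempt was judged to have elicited unsafe output (made-up data).
verdicts = {
    "lrm-a": [True, True, False, True],
    "lrm-b": [False, True, False, False],
}

def attack_success_rate(labels: list[bool]) -> float:
    """Fraction of attack attempts judged to have produced unsafe output."""
    return sum(labels) / len(labels)

per_model_asr = {m: attack_success_rate(v) for m, v in verdicts.items()}
average_asr = sum(per_model_asr.values()) / len(per_model_asr)

print(per_model_asr)         # {'lrm-a': 0.75, 'lrm-b': 0.25}
print(f"{average_asr:.0%}")  # 50%
```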
Toward a Safer Web: Multilingual Multi-Agent LLMs for Mitigating Adversarial Misinformation Attacks
The rapid spread of misinformation on digital platforms threatens public discourse, emotional stability, and decision-making. While prior work has explored various adversarial attacks in misinformation detection, the specific transformations examined in this paper have not been systematically studied. In particular, we investigate language-switching across English, French, Spanish, Arabic, Hindi, and Chinese, followed by translation. We also study query-length inflation preceding summarization and structural reformatting into multiple-choice questions. We present a multilingual, multi-agent large language model framework with retrieval-augmented generation that can be deployed as a web plugin on online platforms. Our work underscores the importance of AI-driven misinformation detection in safeguarding online factual integrity against diverse attacks, while demonstrating the feasibility of plugin-based deployment for real-world web applications.
- Oceania > Australia (0.05)
- North America > United States > New York (0.04)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- Media > News (1.00)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
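One of the structural transformations named in the abstract above, reformatting a claim into a multiple-choice question, can be pictured with a short sketch. The stem wording and distractor options below are hypothetical placeholders, not the paper's actual templates.

```python
# Hypothetical sketch of the multiple-choice reformatting attack: the same
# claim reaches a misinformation detector in a different surface form.
def to_multiple_choice(claim: str) -> str:
    """Wrap a claim as a multiple-choice question (placeholder options)."""
    options = [
        f"A) {claim}",
        "B) The claim above is disputed.",
        "C) The claim above is false.",
        "D) None of the above.",
    ]
    return "Which statement is accurate?\n" + "\n".join(options)

print(to_multiple_choice("Drinking hot water cures the flu."))
```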
MultiHoax: A Dataset of Multi-hop False-Premise Questions
Shafiei, Mohammadamin, Saffari, Hamidreza, Moosavi, Nafise Sadat
As Large Language Models are increasingly deployed in high-stakes domains, their ability to detect false assumptions and reason critically is crucial for ensuring reliable outputs. False-premise questions (FPQs) serve as an important evaluation method by exposing cases where flawed assumptions lead to incorrect responses. While existing benchmarks focus on single-hop FPQs, real-world reasoning often requires multi-hop inference, where models must verify consistency across multiple reasoning steps rather than relying on surface-level cues. To address this gap, we introduce MultiHoax, a benchmark for evaluating LLMs' ability to handle false premises in complex, multi-step reasoning tasks. Our dataset spans seven countries and ten diverse knowledge categories, using Wikipedia as the primary knowledge source to enable factual reasoning across regions. Experiments reveal that state-of-the-art LLMs struggle to detect false premises across different countries, knowledge categories, and multi-hop reasoning types, highlighting the need for improved false premise detection and more robust multi-hop reasoning capabilities in LLMs.
- North America > Canada > Ontario > Toronto (0.04)
- Europe > United Kingdom (0.04)
- Europe > France (0.04)
- (22 more...)
- Research Report (1.00)
- Personal > Honors (1.00)
- Leisure & Entertainment > Sports > Olympic Games (1.00)
- Leisure & Entertainment > Sports > Soccer (0.93)
- Education (0.93)
- Media > News (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
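To make the multi-hop false-premise idea concrete, here is a minimal sketch of what one benchmark record might look like. The field names and the example question are illustrative guesses, not drawn from the released MultiHoax data.

```python
# Hypothetical record layout for a multi-hop false-premise question (FPQ).
from dataclasses import dataclass, field

@dataclass
class FalsePremiseQuestion:
    question: str       # question resting on a flawed assumption
    false_premise: str  # the assumption a model should detect and reject
    hops: list[str] = field(default_factory=list)  # reasoning steps that expose it
    country: str = ""   # regional knowledge the question draws on
    category: str = ""  # knowledge category

example = FalsePremiseQuestion(
    question="Which stadium hosted the 2010 FIFA World Cup final in Canada?",
    false_premise="The 2010 FIFA World Cup final was held in Canada.",
    hops=[
        "The 2010 FIFA World Cup was hosted by South Africa.",
        "Its final was played at Soccer City in Johannesburg, not in Canada.",
    ],
    country="Canada",
    category="Sports",
)
print(example.false_premise)
```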
OpenAI files countersuit against Elon Musk's 'bad faith' attacks
OpenAI has filed a countersuit against Elon Musk, accusing him of staging press attacks and malicious campaigns on "the social media platform he controls," as well as of making "harassing legal claims" and a "sham bid for OpenAI's assets." In its filing, courtesy of TechCrunch, the ChatGPT-maker said Musk could not tolerate seeing such "success for an enterprise he had abandoned and declared doomed" and had made it his own project to take down the organization. It also said that Musk's efforts have ramped up in recent months after it announced its plans to restructure and become a for-profit entity with a non-profit division. Last year, Musk sued OpenAI, accusing it of ditching its nonprofit mission, becoming a "closed-source de facto subsidiary" of Microsoft, and violating its foundational agreement to develop generative AI "for the benefit of humanity." But Musk, OpenAI said in its countersuit, is only pretending to represent the public and in truth is seeking to stop it from restructuring.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)
Google edits Super Bowl ad for AI that featured false information
Google has edited an advert for its leading artificial intelligence (AI) tool, Gemini, before its broadcast during the Super Bowl after it was found to contain false information about gouda cheese. The local commercial, which advertises how people can use "AI for every business", showcases Gemini's abilities by depicting the tool helping a cheesemonger in Wisconsin to write a product description, including the erroneous line that gouda accounts for "50% to 60% of global cheese consumption". However, a blogger posted on X that the stat was an "AI hallucination" that is "unequivocally false", as more reliable data suggests the Dutch cheese is probably less popular than cheddar or mozzarella. The blogger Nate Hake added: "I found the above AI slop example in 20 minutes, and on the first Super Bowl ad I tried factchecking." Replying to him, the Google executive Jerry Dischler said this was not a "hallucination" – where AI systems invent untrue information – but rather a reflection of the fact that the untrue information is contained in the websites that Gemini scrapes.
How 2024 made Elon Musk the world's most powerful unelected man
I've been pondering screen-time and isolation after I suffered through a recent bout of Covid. Even a few days of seclusion coupled with lengthy, uninterrupted stretches of staring at screens were enough to return me to the state of mind in which I spent most of 2020. I hope all of you reading have a wonderful winter and new year, filled with the opposite of that experience: family, friends, and cheery, in-person parties. Today in Techscape: we look back at the biggest tech story of 2024, Elon Musk, and at the Amazon workers strike in the US. The biggest tech story of the year is Elon Musk's rise to omnipresence and an unprecedented level of global power.
- Europe > United Kingdom (0.70)
- Oceania > Australia (0.16)
- Europe > Ukraine (0.15)
- (14 more...)
- Law (1.00)
- Information Technology (1.00)
- Government > Voting & Elections (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
Evaluating the Propensity of Generative AI for Producing Harmful Disinformation During an Election Cycle
Generative Artificial Intelligence offers a powerful tool for adversaries who wish to engage in influence operations, such as the Chinese Spamouflage operation and the Russian Internet Research Agency effort, both of which sought to interfere with recent US election cycles. This study therefore investigates the propensity of current generative AI models for producing harmful disinformation during an election cycle. The probability that different generative AI models produced disinformation when given adversarial prompts was evaluated, along with the associated harm, allowing the expected harm for each model to be computed. Copilot and Gemini tied for the safest overall performance by realizing the lowest expected harm, while GPT-4o produced the greatest rates of harmful disinformation, resulting in much higher expected harm scores. The impact of disinformation category was also investigated: Gemini was safest within the political category, owing to mitigation efforts made by its developers during the election, while Copilot was safest for topics related to health. Moreover, characteristics of adversarial roles were identified that led to greater expected harm across all models. Finally, classification models were developed to predict disinformation production under the conditions considered in this study, offering insight into the factors that drive it. Based on these insights, recommendations are provided for mitigating the factors that lead generative AI models to produce harmful disinformation, in the hope that developers will use them to improve future models.
- Asia (0.34)
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- (4 more...)
- Media > News (1.00)
- Government > Voting & Elections (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
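Read literally, the expected-harm measure described in the abstract above is probability-weighted harm averaged over adversarial prompts; the sketch below assumes that reading, with made-up model names and numbers.

```python
# Hypothetical sketch: expected harm per model, assuming it is the mean of
# P(disinformation | prompt) * harm(prompt) over adversarial prompts.
# Each pair is (probability of producing disinformation, harm score).
trials = {
    "model-x": [(0.10, 3.0), (0.05, 5.0), (0.40, 1.0)],
    "model-y": [(0.30, 3.0), (0.25, 5.0), (0.50, 1.0)],
}

def expected_harm(pairs: list[tuple[float, float]]) -> float:
    """Average probability-weighted harm over adversarial prompts."""
    return sum(p * h for p, h in pairs) / len(pairs)

for model, pairs in trials.items():
    print(model, round(expected_harm(pairs), 3))
```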
Malinowski in the Age of AI: Can large language models create a text game based on an anthropological classic?
Hoffmann, Michael Peter, Fillies, Jan, Paschke, Adrian
Recent advancements in Large Language Models (LLMs) like ChatGPT and GPT-4 have shown remarkable abilities in a wide range of tasks, such as summarizing texts and assisting in coding. Research has also demonstrated that these models can play text-adventure games. This study explores whether LLMs can autonomously create text-based games from anthropological classics, and evaluates their effectiveness in communicating knowledge. To achieve this, the study engaged anthropologists in discussions to gather their expectations and design inputs for an anthropologically themed game. Through iterative processes following the established HCI principle of 'design thinking', the prompts and the conceptual framework for crafting these games were refined. Leveraging GPT-3.5, the study created three game prototypes centered on the social anthropologist Bronislaw Malinowski's seminal work "Argonauts of the Western Pacific" (1922). Senior anthropologists were then invited to playtest these games, and the designs were refined based on their feedback. The tests revealed promising outcomes but also highlighted key challenges: the models had difficulty providing in-depth thematic understanding, showed susceptibility to misinformation, tended towards monotonous responses after extended play, and struggled to offer detailed biographical information. Despite these limitations, the study's findings open up new research avenues at the crossroads of artificial intelligence, machine learning, LLMs, ethnography, anthropology, and human-computer interaction.
- Europe > Germany > Berlin (0.05)
- North America > United States > Oregon (0.04)
- Europe > Germany > Saxony > Leipzig (0.04)
- (5 more...)
- Media (1.00)
- Leisure & Entertainment > Games > Computer Games (1.00)
MAPX: An explainable model-agnostic framework for the detection of false information on social media networks
Condran, Sarah, Bewong, Michael, Kwashie, Selasi, Islam, Md Zahidul, Altas, Irfan, Condran, Joshua
The automated detection of false information has become a fundamental task in combating the spread of "fake news" on online social media networks (OSMN), as it reduces the need for manual discernment by individuals. In the literature, leveraging various content or context features of OSMN documents has been found useful. However, most existing detection models utilise these features in isolation, without regard to the temporal and dynamic changes often seen in reality, which limits their robustness. Furthermore, there has been little to no consideration of the impact of the quality of documents' features on the trustworthiness of the final prediction. In this paper, we introduce MAPX, a novel model-agnostic framework that allows evidence-based aggregation of predictions from existing models in an explainable manner. The developed aggregation method is adaptive, dynamic, and considers the quality of OSMN document features. Further, we perform extensive experiments on benchmark fake news datasets to demonstrate the effectiveness of MAPX under various real-world data quality scenarios. Our empirical results show that the proposed framework consistently outperforms all state-of-the-art models evaluated. For reproducibility, a demo of MAPX is available at https://github.com/SCondran/MAPX_framework
- North America > United States (0.14)
- Oceania > Australia (0.05)
- Europe > Switzerland (0.04)
- Asia > Middle East > Iran (0.04)
- Research Report > Promising Solution (0.54)
- Research Report > New Finding (0.34)
- Media > News (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
- Health & Medicine > Therapeutic Area > Immunology (0.46)
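A minimal sketch of the quality-aware aggregation idea from the MAPX abstract above, assuming a simple quality-weighted mean over base detectors' scores; the paper's actual method is adaptive and evidence-based, and every name and value here is hypothetical.

```python
# Hypothetical sketch: combine base detectors' fake-news scores, weighting
# each model by a quality estimate for the features it relied on.
def aggregate(predictions: dict[str, float], quality: dict[str, float]) -> float:
    """Quality-weighted mean of per-model scores (1.0 = fake)."""
    total = sum(quality[m] for m in predictions)
    return sum(score * quality[m] for m, score in predictions.items()) / total

# Scores from three hypothetical base models on one document, plus
# quality estimates for the feature types each model depends on.
predictions = {"content_model": 0.82, "context_model": 0.35, "propagation_model": 0.61}
quality = {"content_model": 0.9, "context_model": 0.2, "propagation_model": 0.7}

print(round(aggregate(predictions, quality), 3))  # down-weights the low-quality context model
```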
Understanding Knowledge Drift in LLMs through Misinformation
Fastowski, Alina, Kasneci, Gjergji
Large Language Models (LLMs) have revolutionized numerous applications, making them an integral part of our digital ecosystem. However, their reliability becomes critical, especially when these models are exposed to misinformation. We primarily analyze the susceptibility of state-of-the-art LLMs to factual inaccuracies when they encounter false information in a QnA scenario, an issue that can lead to a phenomenon we refer to as "knowledge drift", which significantly undermines the trustworthiness of these models. We evaluate the factuality and the uncertainty of the models' responses using entropy, perplexity, and token-probability metrics. Our experiments reveal that an LLM's uncertainty can increase by up to 56.6% when a question is answered incorrectly due to exposure to false information. At the same time, repeated exposure to the same false information can decrease the model's uncertainty again (-52.8% w.r.t. the answers on the untainted prompts), potentially manipulating the underlying model's beliefs and introducing a drift from its original knowledge. These findings provide insights into LLMs' robustness and vulnerability to adversarial inputs, paving the way for more reliable LLM applications across various domains. The code is available at https://github.com/afastowski/knowledge_drift.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Illinois > Cook County > Chicago (0.05)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- (3 more...)
- Media > News (0.61)
- Government > Military (0.47)
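Two of the uncertainty metrics named in the abstract above, entropy and perplexity, follow directly from token probabilities. A minimal sketch with made-up probability values:

```python
# Minimal sketch of uncertainty metrics computed from token probabilities.
import math

def entropy(probs: list[float]) -> float:
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def perplexity(token_probs: list[float]) -> float:
    """Exponential of the mean negative log-probability of generated tokens."""
    return math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))

next_token_dist = [0.6, 0.2, 0.1, 0.1]  # hypothetical distribution; higher entropy = more uncertain
generated = [0.9, 0.7, 0.4]             # hypothetical per-token probabilities of an answer

print(round(entropy(next_token_dist), 3))
print(round(perplexity(generated), 3))
```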